Disease association tests by inferring ancestral haplotypes using a hidden Markov model

Authors

  • Shu-Yi Su
  • David J. Balding
  • Lachlan James M. Coin
Abstract

MOTIVATION: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with a per-SNP significance threshold of typically about 10^-7) means that many real signals will be missed. It therefore remains highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as an abundance of rare haplotypes and ambiguity in defining haplotype block boundaries.

RESULTS: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model does not assume a rigid block structure, but recognizes a block-like structure if one exists in the data. We employ a hidden Markov model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulating case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC.

AVAILABILITY: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
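The cluster-level test mentioned in the abstract, which compares how many cases and controls carry 0, 1 and 2 chromosomes in a cluster, could be sketched as a Pearson chi-square test on a 2 x 3 contingency table. This is an illustrative assumption only: the abstract does not say which test statistic AncesHC uses, and the function name `cluster_association_test` and the counts below are hypothetical.

```python
import math


def cluster_association_test(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of cluster dosage counts.

    case_counts / control_counts: numbers of individuals carrying 0, 1 and 2
    chromosomes assigned to the cluster (assumed input layout).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    if len(case_counts) != 3 or len(control_counts) != 3:
        raise ValueError("expected counts for dosages 0, 1 and 2")

    # Drop dosage classes that nobody carries; they add no information.
    cols = [(a, b) for a, b in zip(case_counts, control_counts) if a + b > 0]
    n_cases = sum(a for a, _ in cols)
    n_controls = sum(b for _, b in cols)
    total = n_cases + n_controls
    df = len(cols) - 1
    if df < 1 or n_cases == 0 or n_controls == 0:
        return 0.0, 0, 1.0  # degenerate table: no evidence either way

    stat = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected

    # Closed-form chi-square tail probabilities for df = 1 and df = 2.
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    else:
        p_value = math.exp(-stat / 2.0)
    return stat, df, p_value


# Hypothetical counts for one cluster: dosage 0, 1, 2.
stat, df, p_value = cluster_association_test([20, 50, 30], [40, 45, 15])
genome_wide_significant = p_value < 1e-7  # threshold quoted in the abstract
```

Under this sketch, a cluster would be reported only if its p-value falls below the per-test threshold; the abstract quotes about 10^-7 for single-SNP scans, and whether the same threshold applies to cluster tests is not stated.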

Similar resources

Ancestry Inference in Complex Admixtures via Variable-Length Markov Chain Linkage Models

Inferring the ancestral origin of chromosomal segments in admixed individuals is key for genetic applications, ranging from analyzing population demographics and history, to mapping disease genes. Previous methods addressed ancestry inference by using either weak models of linkage disequilibrium, or large models that make explicit use of ancestral haplotypes. In this paper we introduce ALLOY, a...

Haplotype Inference Using a Hidden Markov Model with Efficient Markov Chain Sampling

Knowledge of haplotypes is useful for understanding block structures of the genome and finding genes associated with disease. Direct measurement of haplotypes in the absence of family data is presently impractical. Hence several methods have been developed previously for reconstructing haplotypes from population data. In this thesis, a new population-based method is developed using a Hidden Mar...

Ancestral haplotype-based association mapping with generalized linear mixed models accounting for stratification

MOTIVATION In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models which properly model data under various distributio...

Robust estimation of local genetic ancestry in admixed populations using a nonparametric Bayesian approach.

We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations...

Hidden Markov Dirichlet Process: Modeling Genetic Inference in Open Ancestral Space

The problem of inferring the population structure, linkage disequilibrium pattern, and chromosomal recombination hotspots from genetic polymorphism data is essential for understanding the origin and characteristics of genome variations, with important applications to the genetic analysis of disease propensities and other complex traits. Statistical genetic methodologies developed so far mostly ...

Journal:
  • Bioinformatics

Volume 24, Issue 7

Pages: -

Publication date: 2008